Storing, reasoning, and querying OPM-compliant scientific workflow provenance using relational databases
Abstract
Provenance, the metadata that records the derivation history of scientific results, is essential in scientific workflows to support the reproducibility of scientific discovery, result interpretation, and problem diagnosis. To promote and facilitate interoperability among heterogeneous provenance systems, the Open Provenance Model (OPM) was first proposed in 2008 and has since played an important role in the community. In this paper, we present OPMProv, a relational database-based scientific workflow provenance system that is compliant with OPM (v1.1). Our main contributions are: (i) we design an entity–relationship diagram for OPM and translate it into a relational database schema for the storage of provenance; (ii) we show that the provenance reasoning defined in OPM (v1.1) can be supported by OPMProv using recursive views and SQL queries alone, without any additional reasoning engine. Experiments are conducted to evaluate the performance of OPMProv in data insertion and provenance querying. A case study demonstrates that OPMProv can answer 14 of the 16 queries defined in the Third Provenance Challenge. © 2010 Elsevier B.V. All rights reserved.
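The idea of answering multi-step provenance queries with a recursive view, and no separate reasoning engine, can be illustrated with a minimal sketch. It is not OPMProv's actual schema: the table `was_derived_from`, the view `was_derived_from_star`, the helper functions, and the artifact identifiers are all hypothetical, and SQLite stands in for whatever relational database the system uses. The view computes the transitive closure of single-step derivation edges between artifacts.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical one-table provenance schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- One row per single-step edge: `effect` was derived from `cause`.
        CREATE TABLE was_derived_from (
            effect TEXT NOT NULL,
            cause  TEXT NOT NULL,
            PRIMARY KEY (effect, cause)
        );

        -- Recursive view: transitive closure of the single-step edges.
        -- UNION (not UNION ALL) removes duplicates, so the recursion
        -- terminates even if the edge table contains a cycle.
        CREATE VIEW was_derived_from_star AS
        WITH RECURSIVE closure(effect, cause) AS (
            SELECT effect, cause FROM was_derived_from
            UNION
            SELECT c.effect, d.cause
            FROM closure AS c
            JOIN was_derived_from AS d ON d.effect = c.cause
        )
        SELECT effect, cause FROM closure;
        """
    )
    # Illustrative data only: a3 <- a2 <- a1, and a4 <- a2.
    conn.executemany(
        "INSERT INTO was_derived_from (effect, cause) VALUES (?, ?)",
        [("a3", "a2"), ("a2", "a1"), ("a4", "a2")],
    )
    conn.commit()
    return conn


def ancestors(conn, artifact):
    """Return every artifact that `artifact` was derived from, directly or indirectly."""
    rows = conn.execute(
        "SELECT cause FROM was_derived_from_star WHERE effect = ? ORDER BY cause",
        (artifact,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    demo = build_demo_db()
    print(ancestors(demo, "a3"))
```

Because the closure lives in a view, a lineage question becomes an ordinary `SELECT` against `was_derived_from_star`. The same pattern could in principle be repeated for other multi-step OPM relations, though how OPMProv defines its own views is not stated in this abstract.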
Related resources
Scientific Workflow Provenance Metadata Management Using an RDBMS
Provenance management has become increasingly important to support scientific discovery reproducibility, result interpretation, and problem diagnosis in scientific workflow environments. This paper proposes an approach to provenance management that seamlessly integrates the interoperability, extensibility, and reasoning advantages of Semantic Web technologies with the storage and querying power...
A Graph Model of Data and Workflow Provenance
Provenance has been studied extensively in both database and workflow management systems, so far with little convergence of definitions or models. Provenance in databases has generally been defined for relational or complex object data, by propagating fine-grained annotations or algebraic expressions from the input to the output. This kind of provenance has been found useful in other areas of c...
Special Issue: the Third Provenance Challenge on Using the Open Provenance Model for Interoperability
The third provenance challenge was organized to evaluate the efficacy of the Open Provenance Model (OPM) in representing and sharing provenance with the goal of improving the specification. A data loading scientific workflow that ingests data files into a relational database for the Pan-STARRS sky survey project was selected as a candidate for collecting provenance. Challenge partici...
Scientific Workflow Provenance Metadata Management Using an RDBMS-based RDF Store
Provenance management has become increasingly important to support scientific discovery reproducibility, result interpretation, and problem diagnosis in scientific workflow environments. This paper proposes an approach to provenance management that seamlessly integrates the interoperability, extensibility, and reasoning advantages of Semantic Web technologies with the storage and querying power...
RDFProv: A relational RDF store for querying and managing scientific workflow provenance
Article history: received 12 October 2008; received in revised form 8 March 2010; accepted 11 March 2010; available online 23 March 2010. Provenance metadata has become increasingly important to support scientific discovery reproducibility, result interpretation, and problem diagnosis in scientific workflow environments. The provenance management problem concerns the efficiency and effectiveness of...
Journal title:
- Future Generation Comp. Syst.
Volume 27
Publication date 2011